Cleaning Your Wrong Google Scholar Entries

Authors

  • Shuang Hao
  • Yi Xu
  • Nan Tang
  • Guoliang Li
  • Jianhua Feng
Abstract

Entity categorization, the process of grouping entities into categories for a specific purpose, is an important problem with many applications, such as Google Scholar and Amazon products. Unfortunately, many real-world categories contain mis-categorized entities, such as publications on a researcher's Google Scholar page that were actually written by someone else. We have proposed a general framework for a new research problem: discovering mis-categorized entities. In this demonstration, we present GSCleaner, a Google Chrome extension that we developed as one important application of this problem. Attendees will have the opportunity to experience the following features: (1) mis-categorized entity discovery: an attendee can check for mis-categorized entities on anyone's Google Scholar page; and (2) on-site cleaning: any attendee can log in and clean their own Google Scholar page using GSCleaner. We describe our novel rule-based framework for discovering mis-categorized entities and propose effective optimization techniques for applying the rules. Empirical results show the effectiveness of GSCleaner in discovering mis-categorized entities.

Keywords: mis-categorized entity; Google Scholar cleaner; rule-based framework; signature

Similar resources

Data Quality Not Your Typical Database Problem

Textbook database examples are often wrong and simplistic. Unfortunately, data is never born clean or pure. Errors, missing values, repeated entries, inconsistent instances and unsatisfied business rules are the norm rather than the exception. Data cleaning (also known as data cleansing, record linkage and by many other names) is growing as a major application requirement and an interdiscip...

A bibliometric study of Video Retrieval Evaluation Benchmarking (TRECVid): A methodological analysis

This paper provides a discussion and analysis of methodological issues encountered during a scholarly impact and bibliometric study within the field of computer science (TRECVid Text Retrieval and Evaluation Conference, Video Retrieval Evaluation). The purpose of this paper is to provide a reflection and analysis of the methods used to provide useful information and guidance for those who may w...

Optimize Your Article for Search Engine

This article provides guidelines on how to optimize scholarly literature for academic search engines such as Google Scholar, in order to increase an article's visibility and citations.

Using "Cited by" Information to Find the Context of Research Papers

This paper proposes a novel method of analyzing data to find important information about the context of research papers. The proposed CCTVA (Collecting, Cleaning, Translating, Visualizing, and Analyzing) method helps researchers find the context of papers on topics of interest. Specifically, the method provides visualization information that maps a research topic’s evolution and links to other ...

Hunter X Scholar – Finger out Famous Men in Your Research Area

With the growth of the WWW, publishing research information on the web may become an essential component of academia, and an enormous number of web pages on the Internet provide information on scientists, research papers, and technical documents and are indexed by search engines. For a junior student or junior researcher, it is a nontrivial task to know/search authorita...

Publication date: 2018